A tree-to-tree model for statistical machine translation

نویسنده

  • Brooke Cowan
چکیده

In this thesis, we take a statistical tree-to-tree approach to solving the problem of machine translation (MT). In a statistical tree-to-tree approach, first the source-language input is parsed into a syntactic tree structure; then the source-language tree is mapped to a targetlanguage tree. This kind of approach has several advantages. For one, parsing the input generates valuable information about its meaning. In addition, the mapping from a sourcelanguage tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output. A main focus of this thesis is to develop a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parsetree like structure that models clause-level phenomena such as verbal argument structure and lexical word-order. The AEP also contains alignment information that links the sourcelanguage input to the target-language output. Instead of learning a mapping from a sourcelanguage tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP. The AEP is a complex structure, and learning a mapping from parse trees to AEPs presents a challenging machine learning problem. In this thesis, we use a linear structured prediction model to solve this learning problem. A human evaluation of the AEP-based translation approach in a German-to-English task shows significant improvements in the grammaticality of translations. This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system. Thesis Supervisor: Michael J. Collins Title: Associate Professor

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation

The tree sequence based translation model allows the violation of syntactic boundaries in a rule to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of subtrees. This paper goes further to present a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of sub-trees and gaps. Compared with the contiguous...

متن کامل

A Discriminative Syntactic Model for Source Permutation via Tree Transduction

A major challenge in statistical machine translation is mitigating the word order differences between source and target strings. While reordering and lexical translation choices are often conducted in tandem, source string permutation prior to translation is attractive for studying reordering using hierarchical and syntactic structure. This work contributes an approach for learning source strin...

متن کامل

TSUKU Statistical Machine Translation System for the NTCIR-10 PatentMT Task

This paper describes details of the TSUKU machine translation system in the NTCIR-10 PatentMT task [8] . This system is an implementation of our tree-to-string statistical machine translation model that combines a context-free grammar (CFG) parse tree and a dependency parse tree.

متن کامل

Forest-based Tree Sequence to String Translation Model

This paper proposes a forest-based tree sequence to string translation model for syntaxbased statistical machine translation, which automatically learns tree sequence to string translation rules from word-aligned sourceside-parsed bilingual texts. The proposed model leverages on the strengths of both tree sequence-based and forest-based translation models. Therefore, it can not only utilize for...

متن کامل

Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that makes them well-su...

متن کامل

Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation

We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forestto-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beamand cube-pruning, k-best hypotheses ex...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008